- Path: millenium.texas.net!gcherer
- From: gcherer@millenium.texas.net (GT Cherer)
- Newsgroups: comp.lang.c
- Subject: Huge file performance question
- Date: 24 Feb 1996 18:56:11 GMT
- Organization: Texas Networking, Inc.
- Message-ID: <4gnn0b$nja@nntp.texas.net>
- NNTP-Posting-Host: millenium.texas.net
- X-Newsreader: TIN [version 1.2 PL2]
-
- so, the task is to take nine 100-meg+ files and split them into 200-600
- little files. sounds like a futuristic prison scenario.....
-
- the records are 600 bytes or better. which little file a record goes to
- is based on a 2-byte key. the hp box this runs on can't handle sorting
- the big files, and FOPEN_MAX (a 20-60 open-file maximum) rules out
- having 200-600 files open at once.
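-
- for concreteness, the straight-ahead one-pass demux i'm picturing looks
- roughly like the sketch below, and that's exactly where FOPEN_MAX bites:
- it wants one output stream per distinct key, i.e. 200-600 open at once.
- the fixed 600-byte record size, the key living in the first two bytes of
- each record, and the out.XXXX names are just guesses for illustration.
-
- /* naive demux: one output stream per distinct 2-byte key */
- #include <stdio.h>
- #include <stdlib.h>
-
- #define RECSIZE 600           /* assumed fixed record size */
- #define NKEYS   65536         /* every possible 2-byte key */
-
- int main(int argc, char **argv)
- {
-     static FILE *out[NKEYS];  /* one slot per possible key, all NULL */
-     unsigned char rec[RECSIZE];
-     FILE *in;
-     int i;
-
-     for (i = 1; i < argc; i++) {          /* the nine big files */
-         if ((in = fopen(argv[i], "rb")) == NULL)
-             continue;
-         while (fread(rec, RECSIZE, 1, in) == 1) {
-             unsigned key = (rec[0] << 8) | rec[1];
-             if (out[key] == NULL) {
-                 char name[32];
-                 sprintf(name, "out.%04x", key);
-                 /* with 200-600 distinct keys this fopen fails as
-                    soon as FOPEN_MAX (20-60) streams are open */
-                 if ((out[key] = fopen(name, "ab")) == NULL) {
-                     perror(name);
-                     exit(EXIT_FAILURE);
-                 }
-             }
-             fwrite(rec, RECSIZE, 1, out[key]);
-         }
-         fclose(in);
-     }
-     return 0;
- }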
-
- my first guess was to make an array of (key, ftell position) pairs,
- sort that array, then fseek back into each of the big files to write
- the little ones. that licks both the big-sort problem and the
- max-open-files problem, but it sure seems like a lot of overhead.
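-
- in code, that first guess comes out something like the sketch below for
- one big file (the real job would loop over all nine). sorting the index
- instead of the data keeps the sort small, and since records with the
- same key end up adjacent in the sorted index, only one little file ever
- needs to be open at a time. the bigfile.dat / out.XXXX names and the
- fixed 600-byte record size are placeholders again.
-
- /* index-and-seek: pass 1 builds (key, offset) pairs, qsort() sorts
-    them, pass 2 fseek()s back and appends records to the little files */
- #include <stdio.h>
- #include <stdlib.h>
-
- #define RECSIZE 600               /* assumed fixed record size */
-
- struct slot {
-     unsigned key;                 /* 2-byte key from the record */
-     long     off;                 /* ftell() position of the record */
- };
-
- static int bykey(const void *a, const void *b)
- {
-     const struct slot *x = a, *y = b;
-     return (x->key > y->key) - (x->key < y->key);
- }
-
- int main(void)
- {
-     FILE *in = fopen("bigfile.dat", "rb");
-     struct slot *idx;
-     size_t n = 0, cap = 1024, i;
-     unsigned char rec[RECSIZE];
-
-     if (in == NULL) { perror("bigfile.dat"); return 1; }
-     if ((idx = malloc(cap * sizeof *idx)) == NULL) return 1;
-
-     /* pass 1: remember key + ftell position of every record */
-     for (;;) {
-         long off = ftell(in);
-         if (fread(rec, RECSIZE, 1, in) != 1)
-             break;
-         if (n == cap) {
-             cap *= 2;
-             if ((idx = realloc(idx, cap * sizeof *idx)) == NULL)
-                 return 1;
-         }
-         idx[n].key = (rec[0] << 8) | rec[1];
-         idx[n].off = off;
-         n++;
-     }
-
-     qsort(idx, n, sizeof *idx, bykey);  /* sort the index, not the data */
-
-     /* pass 2: one key's records at a time, so one output stream only */
-     for (i = 0; i < n; ) {
-         unsigned key = idx[i].key;
-         char name[32];
-         FILE *out;
-
-         sprintf(name, "out.%04x", key);
-         /* "ab" so each of the nine inputs can append to the same file */
-         if ((out = fopen(name, "ab")) == NULL) {
-             perror(name);
-             return 1;
-         }
-         while (i < n && idx[i].key == key) {
-             fseek(in, idx[i].off, SEEK_SET);
-             fread(rec, RECSIZE, 1, in);
-             fwrite(rec, RECSIZE, 1, out);
-             i++;
-         }
-         fclose(out);
-     }
-     fclose(in);
-     free(idx);
-     return 0;
- }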
-
- this sounds like the kind of nut that has been cracked a bazillion times
- before. how does one approach a problem involving an over-max sort size
- and over-max open files, especially with performance in mind?
-
- this is a unix system, so using split to break up the files is an
- option, but i'm not sure it's any faster (split/sort vs. ftell/sort/fseek)
- and it adds a disk-space cost (original files + split files + output
- files).
-
- i am in a conundrum and sure could use an experienced hand to point the
- way, not so much to solve the problem as to suggest how to go about
- setting and evaluating criteria.
-
-
- --
- G.T. Jeff Cherer gcherer@texas.net
- Voice: 210-532-7524 SnailMail: 1132 Vanderbilt St. 78210
- "Rolling rocks down a 10,000 ft mountain, they can't be stopped.
- Not because of the rock, but because of the mountain."
- Du Mu, 9th century
-